Overview

Dataset statistics

Number of variables: 15
Number of observations: 16281
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 5
Duplicate rows (%): < 0.1%
Total size in memory: 1.9 MiB
Average record size in memory: 120.0 B
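The overview figures can be recomputed directly with pandas. This is a minimal sketch that assumes the data is already loaded into a DataFrame. The helper name `dataset_statistics` is hypothetical, and pandas-profiling's own memory accounting may differ slightly from `memory_usage(deep=True)`.

```python
import pandas as pd


def dataset_statistics(df: pd.DataFrame) -> dict:
    """Recompute the 'Dataset statistics' overview for a DataFrame (sketch)."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())          # missing cells across all columns
    duplicates = int(df.duplicated().sum())       # rows identical to an earlier row
    total_bytes = int(df.memory_usage(deep=True).sum())  # all columns plus the index
    return {
        "Number of variables": n_cols,
        "Number of observations": n_rows,
        "Missing cells": missing,
        "Missing cells (%)": 100 * missing / n_cells if n_cells else 0.0,
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": 100 * duplicates / n_rows if n_rows else 0.0,
        "Total size in memory (bytes)": total_bytes,
        "Average record size in memory (bytes)": total_bytes / n_rows if n_rows else 0.0,
    }


# Worked check: one 8-byte (int64) column with 16281 rows takes 16281 * 8 = 130248 bytes,
# about 127.2 KiB, which matches the per-variable "Memory size" reported below.
print(round(16281 * 8 / 1024, 1))  # 127.2
```

With 15 such 8-byte columns, a row occupies 15 × 8 = 120 bytes. This matches the reported average record size of 120.0 B, because the index adds only a few bytes in total.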

Variable types

NUM: 13
BOOL: 2

Warnings

Dataset has 5 (< 0.1%) duplicate rows [Duplicates]
workclass has 683 (4.2%) zeros [Zeros]
education has 2670 (16.4%) zeros [Zeros]
marital-status has 5434 (33.4%) zeros [Zeros]
occupation has 1841 (11.3%) zeros [Zeros]
relationship has 4278 (26.3%) zeros [Zeros]
race has 13946 (85.7%) zeros [Zeros]
capital-gain has 14958 (91.9%) zeros [Zeros]
capital-loss has 15518 (95.3%) zeros [Zeros]
native-country has 14662 (90.1%) zeros [Zeros]
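The zero-count warnings can be reproduced by counting exact zeros per numeric column. This is a sketch. The report does not state the cutoff it uses, so `min_fraction` is a hypothetical parameter. Its default of 0.0 is consistent with this report, where every variable containing any zeros is flagged; the smallest flagged share is workclass at 683 / 16281 ≈ 4.2%.

```python
import pandas as pd


def zero_warnings(df: pd.DataFrame, min_fraction: float = 0.0) -> list:
    """List numeric columns whose share of exact zeros exceeds min_fraction (sketch)."""
    n = len(df)
    messages = []
    for col in df.select_dtypes(include="number").columns:
        n_zeros = int((df[col] == 0).sum())
        if n_zeros and n_zeros / n > min_fraction:
            # Same wording as the warnings above, e.g. "workclass has 683 (4.2%) zeros"
            messages.append(f"{col} has {n_zeros} ({100 * n_zeros / n:.1f}%) zeros")
    return messages
```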

Reproduction

Analysis started: 2021-02-01 08:44:03.435559
Analysis finished: 2021-02-01 08:44:48.634597
Duration: 45.2 seconds
Software version: pandas-profiling v2.9.0
Configuration: config.yaml

Variables

age
Real number (ℝ≥0)

Distinct: 73
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.767459
Minimum: 17
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 17
5-th percentile: 19
Q1: 28
median: 37
Q3: 48
95-th percentile: 64
Maximum: 90
Range: 73
Interquartile range (IQR): 20
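The quantile block follows from a handful of quantiles. Range is maximum minus minimum, and IQR is Q3 minus Q1; for age, 90 - 17 = 73 and 48 - 28 = 20. The sketch below uses pandas' default linear interpolation. That the report uses the same interpolation method is an assumption, and the helper name is hypothetical.

```python
import pandas as pd


def quantile_statistics(s: pd.Series) -> dict:
    """Quantile block of the report for one numeric column (sketch)."""
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])  # pandas default: linear interpolation
    return {
        "Minimum": s.min(),
        "5-th percentile": q.loc[0.05],
        "Q1": q.loc[0.25],
        "median": q.loc[0.5],
        "Q3": q.loc[0.75],
        "95-th percentile": q.loc[0.95],
        "Maximum": s.max(),
        "Range": s.max() - s.min(),
        "Interquartile range (IQR)": q.loc[0.75] - q.loc[0.25],
    }
```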

Descriptive statistics

Standard deviation: 13.84918681
Coefficient of variation (CV): 0.3572374143
Kurtosis: -0.2205809761
Mean: 38.767459
Median Absolute Deviation (MAD): 10
Skewness: 0.5545794063
Sum: 631173
Variance: 191.7999754
Monotonicity: Not monotonic
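The descriptive block relates to the mean and standard deviation in simple ways. CV is the standard deviation divided by the mean (for age, 13.84918681 / 38.767459 ≈ 0.3572), and the variance is the standard deviation squared (13.849² ≈ 191.80). This sketch uses pandas' sample estimators and assumes the report does the same. MAD is computed explicitly as the median of absolute deviations from the median, because pandas' old `Series.mad()` returned the *mean* absolute deviation.

```python
import pandas as pd


def descriptive_statistics(s: pd.Series) -> dict:
    """Descriptive block of the report for one numeric column (sketch)."""
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1)
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Kurtosis": s.kurt(),  # excess kurtosis: about 0 for a normal distribution
        "Mean": mean,
        "Median Absolute Deviation (MAD)": (s - s.median()).abs().median(),
        "Skewness": s.skew(),
        "Sum": s.sum(),
        "Variance": s.var(),  # sample variance (ddof=1), equal to std ** 2
        "Monotonic": s.is_monotonic_increasing or s.is_monotonic_decreasing,
    }
```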
Histogram with fixed size bins (bins=50)
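The bin counts in this report follow a visible pattern: bins=50 for age (73 distinct values), bins=9 for workclass (9 distinct) and bins=16 for education (16 distinct). Every variable here is consistent with `min(50, number of distinct values)` equal-width bins. That rule is an inference from this report, not documented behaviour, and the helper below is hypothetical.

```python
import numpy as np


def fixed_size_histogram(values, max_bins: int = 50):
    """Equal-width histogram; bin count capped at the number of distinct values (assumed rule)."""
    values = np.asarray(values)
    n_bins = min(max_bins, len(np.unique(values)))
    # np.histogram splits [min, max] into n_bins equal-width bins; the last bin is closed
    counts, edges = np.histogram(values, bins=n_bins)
    return counts, edges
```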
Common values

Value  Count  Frequency (%)
35  461  2.8%
33  460  2.8%
23  452  2.8%
36  450  2.8%
38  437  2.7%
31  437  2.7%
41  427  2.6%
32  425  2.6%
37  422  2.6%
30  417  2.6%
Other values (63)  11893  73.0%

Minimum 5 values

Value  Count  Frequency (%)
17  200  1.2%
18  312  1.9%
19  341  2.1%
20  360  2.2%
21  376  2.3%

Maximum 5 values

Value  Count  Frequency (%)
90  12  0.1%
89  2  < 0.1%
88  3  < 0.1%
87  2  < 0.1%
85  2  < 0.1%
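The common-value and extreme-value tables are value counts presented in two orders. Common values are sorted by count, with everything past the top ten folded into one row. For age, the ten listed counts sum to 4388, and 16281 - 4388 = 11893, which is exactly the "Other values (63)" row (73.0%). The extreme tables are sorted by value. This is a sketch with hypothetical helper names, assuming ten common values and five extremes as displayed here.

```python
import pandas as pd


def common_values(s: pd.Series, top: int = 10) -> pd.DataFrame:
    """Top-`top` values by count, with the rest folded into an 'Other values (k)' row."""
    counts = s.value_counts()  # sorted by count, descending
    table = pd.DataFrame({"Count": counts.iloc[:top]})
    rest = counts.iloc[top:]
    if len(rest):
        table.loc[f"Other values ({len(rest)})"] = rest.sum()
    table["Frequency (%)"] = 100 * table["Count"] / len(s)
    return table


def extreme_values(s: pd.Series, n: int = 5):
    """Counts of the n smallest and the n largest distinct values."""
    counts = s.value_counts().sort_index()
    return counts.head(n), counts.tail(n).iloc[::-1]  # largest value listed first
```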

workclass
Real number (ℝ≥0)

ZEROS

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.315029789
Minimum: 0
Maximum: 8
Zeros: 683
Zeros (%): 4.2%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 2
Q3: 2
95-th percentile: 5
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.246499083
Coefficient of variation (CV): 0.5384375997
Kurtosis: 1.975612924
Mean: 2.315029789
Median Absolute Deviation (MAD): 0
Skewness: 1.338417779
Sum: 37691
Variance: 1.553759964
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Common values

Value  Count  Frequency (%)
2  11210  68.9%
1  1321  8.1%
4  1043  6.4%
5  963  5.9%
0  683  4.2%
6  579  3.6%
3  472  2.9%
7  7  < 0.1%
8  3  < 0.1%

Minimum 5 values

Value  Count  Frequency (%)
0  683  4.2%
1  1321  8.1%
2  11210  68.9%
3  472  2.9%
4  1043  6.4%

Maximum 5 values

Value  Count  Frequency (%)
8  3  < 0.1%
7  7  < 0.1%
6  579  3.6%
5  963  5.9%
4  1043  6.4%

fnlwgt
Real number (ℝ≥0)

Distinct: 12787
Distinct (%): 78.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189435.6778
Minimum: 13492
Maximum: 1490400
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 13492
5-th percentile: 40641
Q1: 116736
median: 177831
Q3: 238384
95-th percentile: 378922
Maximum: 1490400
Range: 1476908
Interquartile range (IQR): 121648

Descriptive statistics

Standard deviation: 105714.9077
Coefficient of variation (CV): 0.5580517298
Kurtosis: 5.739969535
Mean: 189435.6778
Median Absolute Deviation (MAD): 60904
Skewness: 1.422954131
Sum: 3084202270
Variance: 1.11756417e+10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
136986  9  0.1%
190290  8  < 0.1%
125892  8  < 0.1%
203488  8  < 0.1%
127651  8  < 0.1%
117310  8  < 0.1%
120277  8  < 0.1%
111567  7  < 0.1%
48520  7  < 0.1%
126569  7  < 0.1%
Other values (12777)  16203  99.5%

Minimum 5 values

Value  Count  Frequency (%)
13492  1  < 0.1%
13769  2  < 0.1%
13862  1  < 0.1%
19302  1  < 0.1%
19410  1  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
1490400  1  < 0.1%
1210504  1  < 0.1%
1117718  1  < 0.1%
1047822  1  < 0.1%
1024535  1  < 0.1%

education
Real number (ℝ≥0)

ZEROS

Distinct: 16
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.386954118
Minimum: 0
Maximum: 15
Zeros: 2670
Zeros (%): 16.4%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 5
95-th percentile: 11
Maximum: 15
Range: 15
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.440724713
Coefficient of variation (CV): 1.015875797
Kurtosis: 1.277991889
Mean: 3.386954118
Median Absolute Deviation (MAD): 2
Skewness: 1.273732435
Sum: 55143
Variance: 11.83858655
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Common values

Value  Count  Frequency (%)
1  5283  32.4%
5  3587  22.0%
0  2670  16.4%
3  934  5.7%
7  679  4.2%
2  637  3.9%
6  534  3.3%
12  456  2.8%
8  309  1.9%
10  258  1.6%
Other values (6)  934  5.7%

Minimum 5 values

Value  Count  Frequency (%)
0  2670  16.4%
1  5283  32.4%
2  637  3.9%
3  934  5.7%
4  242  1.5%

Maximum 5 values

Value  Count  Frequency (%)
15  224  1.4%
14  32  0.2%
13  79  0.5%
12  456  2.8%
11  176  1.1%
 

education-num
Real number (ℝ≥0)

Distinct: 16
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.07290707
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 9
median: 10
Q3: 12
95-th percentile: 14
Maximum: 16
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.567545259
Coefficient of variation (CV): 0.2548961527
Kurtosis: 0.6308129327
Mean: 10.07290707
Median Absolute Deviation (MAD): 1
Skewness: -0.3263376289
Sum: 163997
Variance: 6.592288655
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Common values

Value  Count  Frequency (%)
9  5283  32.4%
10  3587  22.0%
13  2670  16.4%
14  934  5.7%
11  679  4.2%
7  637  3.9%
12  534  3.3%
6  456  2.8%
4  309  1.9%
15  258  1.6%
Other values (6)  934  5.7%

Minimum 5 values

Value  Count  Frequency (%)
1  32  0.2%
2  79  0.5%
3  176  1.1%
4  309  1.9%
5  242  1.5%

Maximum 5 values

Value  Count  Frequency (%)
16  181  1.1%
15  258  1.6%
14  934  5.7%
13  2670  16.4%
12  534  3.3%

marital-status
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.084270008
Minimum: 0
Maximum: 6
Zeros: 5434
Zeros (%): 33.4%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 1
95-th percentile: 4
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.269621951
Coefficient of variation (CV): 1.170946297
Kurtosis: 5.471059299
Mean: 1.084270008
Median Absolute Deviation (MAD): 1
Skewness: 2.159702477
Sum: 17653
Variance: 1.611939899
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Common values

Value  Count  Frequency (%)
1  7403  45.5%
0  5434  33.4%
2  2190  13.5%
6  525  3.2%
4  505  3.1%
3  210  1.3%
5  14  0.1%

Minimum 5 values

Value  Count  Frequency (%)
0  5434  33.4%
1  7403  45.5%
2  2190  13.5%
3  210  1.3%
4  505  3.1%

Maximum 5 values

Value  Count  Frequency (%)
6  525  3.2%
5  14  0.1%
4  505  3.1%
3  210  1.3%
2  2190  13.5%

occupation
Real number (ℝ≥0)

ZEROS

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.73115902
Minimum: 0
Maximum: 14
Zeros: 1841
Zeros (%): 11.3%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
median: 4
Q3: 7
95-th percentile: 11
Maximum: 14
Range: 14
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.42594777
Coefficient of variation (CV): 0.7241244176
Kurtosis: -0.6229911147
Mean: 4.73115902
Median Absolute Deviation (MAD): 2
Skewness: 0.4640893416
Sum: 77028
Variance: 11.73711812
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Common values

Value  Count  Frequency (%)
3  2032  12.5%
1  2020  12.4%
6  2013  12.4%
5  1854  11.4%
0  1841  11.3%
4  1628  10.0%
9  1020  6.3%
11  966  5.9%
7  758  4.7%
2  702  4.3%
Other values (5)  1447  8.9%

Minimum 5 values

Value  Count  Frequency (%)
0  1841  11.3%
1  2020  12.4%
2  702  4.3%
3  2032  12.5%
4  1628  10.0%

Maximum 5 values

Value  Count  Frequency (%)
14  93  0.6%
13  6  < 0.1%
12  334  2.1%
11  966  5.9%
10  518  3.2%

relationship
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.531171304
Minimum: 0
Maximum: 5
Zeros: 4278
Zeros (%): 26.3%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 3
95-th percentile: 4
Maximum: 5
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.445369429
Coefficient of variation (CV): 0.9439632426
Kurtosis: -0.5484959242
Mean: 1.531171304
Median Absolute Deviation (MAD): 1
Skewness: 0.7911428132
Sum: 24929
Variance: 2.089092786
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Common values

Value  Count  Frequency (%)
1  6523  40.1%
0  4278  26.3%
3  2513  15.4%
4  1679  10.3%
2  763  4.7%
5  525  3.2%

Minimum 5 values

Value  Count  Frequency (%)
0  4278  26.3%
1  6523  40.1%
2  763  4.7%
3  2513  15.4%
4  1679  10.3%

Maximum 5 values

Value  Count  Frequency (%)
5  525  3.2%
4  1679  10.3%
3  2513  15.4%
2  763  4.7%
1  6523  40.1%
 

race
Real number (ℝ≥0)

ZEROS

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2173085191
Minimum: 0
Maximum: 4
Zeros: 13946
Zeros (%): 85.7%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6222315817
Coefficient of variation (CV): 2.863355676
Kurtosis: 14.47767592
Mean: 0.2173085191
Median Absolute Deviation (MAD): 0
Skewness: 3.584743389
Sum: 3538
Variance: 0.3871721412
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=5)
Common values

Value  Count  Frequency (%)
0  13946  85.7%
1  1561  9.6%
2  480  2.9%
3  159  1.0%
4  135  0.8%

Minimum 5 values

Value  Count  Frequency (%)
0  13946  85.7%
1  1561  9.6%
2  480  2.9%
3  159  1.0%
4  135  0.8%

Maximum 5 values

Value  Count  Frequency (%)
4  135  0.8%
3  159  1.0%
2  480  2.9%
1  1561  9.6%
0  13946  85.7%
 

sex
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
Value  Count  Frequency (%)
0  10860  66.7%
1  5421  33.3%

capital-gain
Real number (ℝ≥0)

ZEROS

Distinct: 113
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1081.905104
Minimum: 0
Maximum: 99999
Zeros: 14958
Zeros (%): 91.9%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 4865
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7583.935968
Coefficient of variation (CV): 7.009797753
Kurtosis: 148.6715143
Mean: 1081.905104
Median Absolute Deviation (MAD): 0
Skewness: 11.77829263
Sum: 17614497
Variance: 57516084.77
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
0  14958  91.9%
15024  166  1.0%
7688  126  0.8%
7298  118  0.7%
99999  85  0.5%
3103  55  0.3%
5178  49  0.3%
5013  48  0.3%
4386  38  0.2%
3325  28  0.2%
Other values (103)  610  3.7%

Minimum 5 values

Value  Count  Frequency (%)
0  14958  91.9%
114  2  < 0.1%
401  3  < 0.1%
594  18  0.1%
914  2  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
99999  85  0.5%
41310  1  < 0.1%
34095  1  < 0.1%
27828  24  0.1%
25236  3  < 0.1%

capital-loss
Real number (ℝ≥0)

ZEROS

Distinct: 82
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.89926909
Minimum: 0
Maximum: 3770
Zeros: 15518
Zeros (%): 95.3%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 3770
Range: 3770
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 403.1052856
Coefficient of variation (CV): 4.585991326
Kurtosis: 19.29718195
Mean: 87.89926909
Median Absolute Deviation (MAD): 0
Skewness: 4.52064718
Sum: 1431088
Variance: 162493.8713
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
0  15518  95.3%
1902  102  0.6%
1977  85  0.5%
1887  74  0.5%
2415  23  0.1%
1590  22  0.1%
1876  20  0.1%
1741  20  0.1%
1485  20  0.1%
1564  18  0.1%
Other values (72)  379  2.3%

Minimum 5 values

Value  Count  Frequency (%)
0  15518  95.3%
213  1  < 0.1%
323  2  < 0.1%
625  5  < 0.1%
653  1  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
3770  2  < 0.1%
3175  2  < 0.1%
3004  3  < 0.1%
2824  4  < 0.1%
2603  2  < 0.1%
 

hours-per-week
Real number (ℝ≥0)

Distinct: 89
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.39223635
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 16
Q1: 40
median: 40
Q3: 45
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 12.47933225
Coefficient of variation (CV): 0.3089537341
Kurtosis: 3.016899125
Mean: 40.39223635
Median Absolute Deviation (MAD): 3
Skewness: 0.2604188513
Sum: 657626
Variance: 155.7337333
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
40  7586  46.6%
50  1427  8.8%
45  893  5.5%
60  702  4.3%
35  640  3.9%
20  638  3.9%
30  551  3.4%
55  357  2.2%
25  284  1.7%
48  253  1.6%
Other values (79)  2950  18.1%

Minimum 5 values

Value  Count  Frequency (%)
1  7  < 0.1%
2  21  0.1%
3  20  0.1%
4  30  0.2%
5  35  0.2%

Maximum 5 values

Value  Count  Frequency (%)
99  52  0.3%
98  3  < 0.1%
96  4  < 0.1%
92  2  < 0.1%
90  13  0.1%
 

native-country
Real number (ℝ≥0)

ZEROS

Distinct: 41
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.241139979
Minimum: 0
Maximum: 40
Zeros: 14662
Zeros (%): 90.1%
Memory size: 127.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 7
Maximum: 40
Range: 40
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.941918703
Coefficient of variation (CV): 3.981757728
Kurtosis: 26.39634926
Mean: 1.241139979
Median Absolute Deviation (MAD): 0
Skewness: 4.983488573
Sum: 20207
Variance: 24.42256047
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=41)
Common values

Value  Count  Frequency (%)
0  14662  90.1%
5  308  1.9%
4  274  1.7%
13  97  0.6%
7  70  0.4%
11  69  0.4%
10  61  0.4%
3  51  0.3%
25  49  0.3%
28  47  0.3%
Other values (31)  593  3.6%

Minimum 5 values

Value  Count  Frequency (%)
0  14662  90.1%
1  43  0.3%
2  25  0.2%
3  51  0.3%
4  274  1.7%

Maximum 5 values

Value  Count  Frequency (%)
40  6  < 0.1%
39  13  0.1%
38  10  0.1%
37  19  0.1%
36  15  0.1%
 

income
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 127.2 KiB
Value  Count  Frequency (%)
0  12435  76.4%
1  3846  23.6%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (adding a constant to either variable, or multiplying either by a positive constant). For an exact linear relationship, therefore, r is ±1 regardless of the slope of the line, provided the slope is nonzero.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
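The definition translates directly into code. This is a plain-Python sketch for illustration; in pandas, `df.corr(method="pearson")` computes the full matrix.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/(n-1) normalization appears in the covariance and in both standard
    # deviations, so it cancels and is omitted here.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

Rescaling either input by a positive factor or shifting it leaves the result unchanged, which is the invariance described above.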

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
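In other words, ρ is Pearson's r applied to ranks. The sketch below assigns tied values the mean of the ranks they occupy, which is the usual convention and also what pandas' `rank()` does by default. The helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold rank start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

For y = x³, ρ is exactly 1 because the relationship is monotonic, while Pearson's r stays below 1 because it is not linear.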

Kendall's τ

Like Spearman's ρ, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
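Counting pairs as described gives the variant usually called tau-a. This O(n²) sketch is for illustration only. As far as we know, pandas' `corr(method="kendall")` delegates to SciPy's `kendalltau`, whose default tau-b variant adjusts the denominator for ties; without ties, the two variants agree.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs (tau-a)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # pairs tied in x or in y are neither concordant nor discordant
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For x = [1, 2, 3, 4] and y = [1, 3, 2, 4], five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6 ≈ 0.667.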

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values


Sample

First rows

   age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income
0  25  2  226802  2  7  0  9  3  1  0  0  0  40  0  0
1  38  2  89814  1  9  1  8  1  0  0  0  0  50  0  0
2  28  4  336951  6  12  1  12  1  0  0  0  0  40  0  1
3  44  2  160323  5  10  1  9  1  1  0  7688  0  40  0  1
4  18  5  103497  5  10  0  11  3  0  1  0  0  30  0  0
5  34  2  198693  12  6  0  4  0  0  0  0  0  30  0  0
6  29  5  227026  1  9  0  11  4  1  0  0  0  40  0  0
7  63  1  104626  10  15  1  3  1  0  0  3103  0  32  0  1
8  24  2  369667  5  10  0  4  4  0  1  0  0  40  0  0
9  55  2  104996  8  4  1  6  1  0  0  0  0  10  0  0

Last rows

       age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income
16271  61  2  89686  1  9  1  5  1  0  0  0  0  48  0  0
16272  31  2  440129  1  9  1  6  1  0  0  0  0  40  0  0
16273  25  2  350977  1  9  0  4  3  0  1  0  0  40  0  0
16274  48  4  349230  3  14  2  4  0  0  0  0  0  40  0  0
16275  33  2  245211  0  13  0  3  3  0  0  0  0  40  0  0
16276  39  2  215419  0  13  2  3  0  0  1  0  0  36  0  0
16277  64  5  321403  1  9  6  11  5  1  0  0  0  40  0  0
16278  38  2  374983  0  13  1  3  1  0  0  0  0  50  0  0
16279  44  2  83891  0  13  2  0  3  2  0  5455  0  40  0  0
16280  35  6  182148  0  13  1  1  1  0  0  0  0  60  0  1

Duplicate rows

Most frequent

   age  workclass  fnlwgt  education  education-num  marital-status  occupation  relationship  race  sex  capital-gain  capital-loss  hours-per-week  native-country  income  count
0  18  6  378036  15  8  0  8  3  0  0  0  0  10  0  0  2
1  24  2  194630  0  13  0  3  0  0  0  0  0  35  0  0  2
2  29  2  36440  0  13  0  0  0  0  1  0  0  40  0  0  2
3  30  2  180317  7  11  2  9  0  0  0  0  0  40  0  0  2
4  37  2  52870  0  13  1  1  1  0  0  0  0  40  0  0  2
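A table like this can be built by grouping the repeated rows on every column and counting each group. This is a sketch with a hypothetical helper name. `df.duplicated()` counts only the extra copies, so five groups of count 2 correspond to five duplicate rows, which is consistent with the overview if these five groups are all of them. `groupby` drops rows with missing keys by default; that does not matter here because the dataset has no missing cells.

```python
import pandas as pd


def most_frequent_duplicates(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Repeated rows with how often each occurs, most frequent first (sketch)."""
    repeated = df[df.duplicated(keep=False)]  # keep every copy of each repeated row
    counts = repeated.groupby(list(df.columns)).size().reset_index(name="count")
    return counts.sort_values("count", ascending=False, kind="stable").head(n).reset_index(drop=True)
```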